{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f9a363cb-8a8e-44d7-837e-35d8a8ed770a",
   "metadata": {},
   "source": [
    "# [WIP] Hyperparameter Optimization for RAG\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/param_optimizer/param_optimizer.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "In this guide we show you how to do hyperparameter optimization for RAG.\n",
    "\n",
    "We use our new, experimental `ParamTuner` class which allows hyperparameter grid search over a RAG function. It comes in two variants:\n",
    "\n",
    "- `ParamTuner`: a naive way for parameter tuning by iterating over all parameters.\n",
    "- `RayTuneParamTuner`: a hyperparameter tuning mechanism powered by [Ray Tune](https://docs.ray.io/en/latest/tune/index.html)\n",
    "\n",
    "The `ParamTuner` can take in any function that outputs a dictionary of values. In this setting we define a function that constructs a basic RAG ingestion pipeline from a set of documents (the Llama 2 paper), runs it over an evaluation dataset, and measures a correctness metric.\n",
    "\n",
    "We investigate tuning the following parameters:\n",
    "\n",
    "- Chunk size\n",
    "- Top k value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a48fefdf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-openai\n",
    "%pip install llama-index-embeddings-openai\n",
    "%pip install llama-index-readers-file\n",
    "%pip install llama-index-experimental-param-tuner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fefd64a3-6223-4e4d-88e8-60e9b52e3fd4",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index llama-hub"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23820c9f-f5a6-4914-9da8-1f1bcc3e21ca",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2023-11-04 00:16:34--  https://arxiv.org/pdf/2307.09288.pdf\n",
      "Resolving arxiv.org (arxiv.org)... 128.84.21.199\n",
      "Connecting to arxiv.org (arxiv.org)|128.84.21.199|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 13661300 (13M) [application/pdf]\n",
      "Saving to: ‘data/llama2.pdf’\n",
      "\n",
      "data/llama2.pdf     100%[===================>]  13.03M   533KB/s    in 36s     \n",
      "\n",
      "2023-11-04 00:17:10 (376 KB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir data && wget --user-agent \"Mozilla\" \"https://arxiv.org/pdf/2307.09288.pdf\" -O \"data/llama2.pdf\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8360ecc9-770f-4f8e-88ac-195478a6dade",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c98cc0d-7dcf-4ed8-baf5-b3fffec035cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from llama_index.readers.file import PDFReader\n",
    "from llama_index.readers.file import UnstructuredReader\n",
    "from llama_index.readers.file import PyMuPDFReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a321cda3-19ba-4fc9-8301-33d1ebd9afa4",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = PDFReader()\n",
    "docs0 = loader.load_data(file=Path(\"./data/llama2.pdf\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "960ce175-dce1-4a7f-9196-9e0c009e67db",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "\n",
    "doc_text = \"\\n\\n\".join([d.get_content() for d in docs0])\n",
    "docs = [Document(text=doc_text)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb0c05b5-e7ee-4848-9079-c085a21e9f20",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.node_parser import SimpleNodeParser\n",
    "from llama_index.core.schema import IndexNode"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "386890ad-f815-4ad0-9550-40408341f1ed",
   "metadata": {},
   "source": [
    "## Load \"Golden\" Evaluation Dataset\n",
    "\n",
    "Here we setup a \"golden\" evaluation dataset for the llama2 paper.\n",
    "\n",
    "**NOTE**: We pull this in from Dropbox. For details on how to generate a dataset please see our `DatasetGenerator` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff773413-fb47-40ff-b918-2113bc4b8511",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget \"https://www.dropbox.com/scl/fi/fh9vsmmm8vu0j50l3ss38/llama2_eval_qr_dataset.json?rlkey=kkoaez7aqeb4z25gzc06ak6kb&dl=1\" -O data/llama2_eval_qr_dataset.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "995d3cc4-5b4d-494a-9183-5ce4dd336871",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.evaluation import QueryResponseDataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5047413a-924d-4c8a-87f2-8f3da4274b7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# optional\n",
    "eval_dataset = QueryResponseDataset.from_json(\n",
    "    \"data/llama2_eval_qr_dataset.json\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9a492ae-5576-4ec1-bf45-763875f3a0c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_qs = eval_dataset.questions\n",
    "ref_response_strs = [r for (_, r) in eval_dataset.qr_pairs]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e55acabf-b5a9-4e9a-a041-24da13bf68b9",
   "metadata": {},
   "source": [
    "## Define Objective Function + Parameters\n",
    "\n",
    "Here we define function to optimize given the parameters.\n",
    "\n",
    "The function specifically does the following: 1) builds an index from documents, 2) queries index, and runs some basic evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ea8afb1-1186-4799-81f2-4e3a7a7e6a91",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import (\n",
    "    VectorStoreIndex,\n",
    "    load_index_from_storage,\n",
    "    StorageContext,\n",
    ")\n",
    "from llama_index.experimental.param_tuner import ParamTuner\n",
    "from llama_index.core.param_tuner.base import TunedResult, RunResult\n",
    "from llama_index.core.evaluation.eval_utils import (\n",
    "    get_responses,\n",
    "    aget_responses,\n",
    ")\n",
    "from llama_index.core.evaluation import (\n",
    "    SemanticSimilarityEvaluator,\n",
    "    BatchEvalRunner,\n",
    ")\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "\n",
    "import os\n",
    "import numpy as np\n",
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fa6e23c-46cf-4c1f-a5b4-f0b53d8058bd",
   "metadata": {},
   "source": [
    "### Helper Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e48d62ca-fffa-4802-a236-9687bd385584",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _build_index(chunk_size, docs):\n",
    "    index_out_path = f\"./storage_{chunk_size}\"\n",
    "    if not os.path.exists(index_out_path):\n",
    "        Path(index_out_path).mkdir(parents=True, exist_ok=True)\n",
    "        # parse docs\n",
    "        node_parser = SimpleNodeParser.from_defaults(chunk_size=chunk_size)\n",
    "        base_nodes = node_parser.get_nodes_from_documents(docs)\n",
    "\n",
    "        # build index\n",
    "        index = VectorStoreIndex(base_nodes)\n",
    "        # save index to disk\n",
    "        index.storage_context.persist(index_out_path)\n",
    "    else:\n",
    "        # rebuild storage context\n",
    "        storage_context = StorageContext.from_defaults(\n",
    "            persist_dir=index_out_path\n",
    "        )\n",
    "        # load index\n",
    "        index = load_index_from_storage(\n",
    "            storage_context,\n",
    "        )\n",
    "    return index\n",
    "\n",
    "\n",
    "def _get_eval_batch_runner():\n",
    "    evaluator_s = SemanticSimilarityEvaluator(embed_model=OpenAIEmbedding())\n",
    "    eval_batch_runner = BatchEvalRunner(\n",
    "        {\"semantic_similarity\": evaluator_s}, workers=2, show_progress=True\n",
    "    )\n",
    "\n",
    "    return eval_batch_runner"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dee2cb3c-5823-4eda-96f8-cc860edf0884",
   "metadata": {},
   "source": [
    "### Objective Function (Sync)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0244c23-8505-4812-9cca-408d32f2033b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def objective_function(params_dict):\n",
    "    chunk_size = params_dict[\"chunk_size\"]\n",
    "    docs = params_dict[\"docs\"]\n",
    "    top_k = params_dict[\"top_k\"]\n",
    "    eval_qs = params_dict[\"eval_qs\"]\n",
    "    ref_response_strs = params_dict[\"ref_response_strs\"]\n",
    "\n",
    "    # build index\n",
    "    index = _build_index(chunk_size, docs)\n",
    "\n",
    "    # query engine\n",
    "    query_engine = index.as_query_engine(similarity_top_k=top_k)\n",
    "\n",
    "    # get predicted responses\n",
    "    pred_response_objs = get_responses(\n",
    "        eval_qs, query_engine, show_progress=True\n",
    "    )\n",
    "\n",
    "    # run evaluator\n",
    "    # NOTE: can uncomment other evaluators\n",
    "    eval_batch_runner = _get_eval_batch_runner()\n",
    "    eval_results = eval_batch_runner.evaluate_responses(\n",
    "        eval_qs, responses=pred_response_objs, reference=ref_response_strs\n",
    "    )\n",
    "\n",
    "    # get semantic similarity metric\n",
    "    mean_score = np.array(\n",
    "        [r.score for r in eval_results[\"semantic_similarity\"]]\n",
    "    ).mean()\n",
    "\n",
    "    return RunResult(score=mean_score, params=params_dict)"
   ]
  },
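  {
   "cell_type": "markdown",
   "id": "sketch-mean-similarity",
   "metadata": {},
   "source": [
    "The score returned by the objective function is the mean of the per-question semantic similarity scores. The sketch below works through that arithmetic on made-up vectors, assuming the similarity is the cosine between the embedding of a predicted answer and the embedding of its reference answer (an assumption about the evaluator, not something this notebook configures). `cosine_similarity` and the toy vectors are hypothetical and only illustrate the calculation.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def cosine_similarity(a, b):\n",
    "    \"\"\"Cosine of the angle between two equal-length vectors.\"\"\"\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    norm_a = math.sqrt(sum(x * x for x in a))\n",
    "    norm_b = math.sqrt(sum(y * y for y in b))\n",
    "    return dot / (norm_a * norm_b)\n",
    "\n",
    "\n",
    "# Made-up 3-dimensional \"embeddings\" of (prediction, reference) pairs.\n",
    "pairs = [\n",
    "    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction -> 1.0\n",
    "    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal -> 0.0\n",
    "]\n",
    "scores = [cosine_similarity(p, r) for p, r in pairs]\n",
    "mean_score = sum(scores) / len(scores)\n",
    "print(mean_score)  # 0.5\n",
    "```\n",
    "\n",
    "A higher mean indicates that, on average, the generated answers sit closer to the reference answers in embedding space; it does not by itself check factual correctness."
   ]
  },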
  {
   "cell_type": "markdown",
   "id": "51331345-2e34-4ad9-b4b0-079b52d00353",
   "metadata": {},
   "source": [
    "### Objective Function (Async)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f5c9a6c-f176-409a-9a2d-80315a225725",
   "metadata": {},
   "outputs": [],
   "source": [
    "async def aobjective_function(params_dict):\n",
    "    chunk_size = params_dict[\"chunk_size\"]\n",
    "    docs = params_dict[\"docs\"]\n",
    "    top_k = params_dict[\"top_k\"]\n",
    "    eval_qs = params_dict[\"eval_qs\"]\n",
    "    ref_response_strs = params_dict[\"ref_response_strs\"]\n",
    "\n",
    "    # build index\n",
    "    index = _build_index(chunk_size, docs)\n",
    "\n",
    "    # query engine\n",
    "    query_engine = index.as_query_engine(similarity_top_k=top_k)\n",
    "\n",
    "    # get predicted responses\n",
    "    pred_response_objs = await aget_responses(\n",
    "        eval_qs, query_engine, show_progress=True\n",
    "    )\n",
    "\n",
    "    # run evaluator\n",
    "    # NOTE: can uncomment other evaluators\n",
    "    eval_batch_runner = _get_eval_batch_runner()\n",
    "    eval_results = await eval_batch_runner.aevaluate_responses(\n",
    "        eval_qs, responses=pred_response_objs, reference=ref_response_strs\n",
    "    )\n",
    "\n",
    "    # get semantic similarity metric\n",
    "    mean_score = np.array(\n",
    "        [r.score for r in eval_results[\"semantic_similarity\"]]\n",
    "    ).mean()\n",
    "\n",
    "    return RunResult(score=mean_score, params=params_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1391168-65a9-4445-ace8-e8c0403083f0",
   "metadata": {},
   "source": [
    "### Parameters\n",
    "\n",
    "We define both the parameters to grid-search over `param_dict` and fixed parameters `fixed_param_dict`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5b93bbb-12a9-48df-b621-0bd9260ee154",
   "metadata": {},
   "outputs": [],
   "source": [
    "param_dict = {\"chunk_size\": [256, 512, 1024], \"top_k\": [1, 2, 5]}\n",
    "# param_dict = {\n",
    "#     \"chunk_size\": [256],\n",
    "#     \"top_k\": [1]\n",
    "# }\n",
    "fixed_param_dict = {\n",
    "    \"docs\": docs,\n",
    "    \"eval_qs\": eval_qs[:10],\n",
    "    \"ref_response_strs\": ref_response_strs[:10],\n",
    "}"
   ]
  },
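  {
   "cell_type": "markdown",
   "id": "sketch-grid-expansion",
   "metadata": {},
   "source": [
    "To make the size of the search concrete, the sketch below expands a grid like `param_dict` into one parameter dictionary per run and merges in the fixed values. It illustrates what a grid search does in general and is not taken from the `ParamTuner` source; `expand_grid`, `sketch_param_dict`, and `sketch_fixed_params` are hypothetical names introduced here.\n",
    "\n",
    "```python\n",
    "import itertools\n",
    "\n",
    "# Hypothetical stand-ins for the notebook's dictionaries (values shortened).\n",
    "sketch_param_dict = {\"chunk_size\": [256, 512, 1024], \"top_k\": [1, 2, 5]}\n",
    "sketch_fixed_params = {\"eval_qs\": [\"q1\", \"q2\"]}\n",
    "\n",
    "\n",
    "def expand_grid(param_dict, fixed_param_dict):\n",
    "    \"\"\"Return one merged parameter dictionary per grid combination.\"\"\"\n",
    "    keys = list(param_dict)\n",
    "    combos = []\n",
    "    for values in itertools.product(*(param_dict[k] for k in keys)):\n",
    "        # fixed parameters first, then the values picked for this run\n",
    "        combos.append({**fixed_param_dict, **dict(zip(keys, values))})\n",
    "    return combos\n",
    "\n",
    "\n",
    "grid = expand_grid(sketch_param_dict, sketch_fixed_params)\n",
    "print(len(grid))  # 3 chunk sizes x 3 top-k values = 9 runs\n",
    "print(grid[0])\n",
    "```\n",
    "\n",
    "Each of the nine dictionaries corresponds to one call of the objective function, so the cost of the search grows with the product of the list lengths. Keeping the evaluation set small (10 questions here) keeps each call cheap."
   ]
  },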
  {
   "cell_type": "markdown",
   "id": "d2b7b6fd-4e1e-4f95-8639-1d7b5b117f46",
   "metadata": {},
   "source": [
    "## Run ParamTuner (default)\n",
    "\n",
    "Here we run our default param tuner, which iterates through all hyperparameter combinations either synchronously or in async."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49cdb04f-979a-4a19-b038-14d5c4e3f80a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.experimental.param_tuner import ParamTuner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2dae8a4-f3ed-404f-9564-a19319765f20",
   "metadata": {},
   "outputs": [],
   "source": [
    "param_tuner = ParamTuner(\n",
    "    param_fn=objective_function,\n",
    "    param_dict=param_dict,\n",
    "    fixed_param_dict=fixed_param_dict,\n",
    "    show_progress=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84c69236-a3be-45c8-894c-f6f6d321254b",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = param_tuner.tune()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a71de43-ebfe-4b32-8d3d-f587c02b7a57",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 0.9490885841089257\n",
      "Top-k: 2\n",
      "Chunk size: 512\n"
     ]
    }
   ],
   "source": [
    "best_result = results.best_run_result\n",
    "best_top_k = results.best_run_result.params[\"top_k\"]\n",
    "best_chunk_size = results.best_run_result.params[\"chunk_size\"]\n",
    "print(f\"Score: {best_result.score}\")\n",
    "print(f\"Top-k: {best_top_k}\")\n",
    "print(f\"Chunk size: {best_chunk_size}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4556a80-69f1-42b2-b295-d1f55faefa8d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.9263373628377412, 1, 256)"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# adjust test_idx for additional testing\n",
    "test_idx = 6\n",
    "p = results.run_results[test_idx].params\n",
    "(results.run_results[test_idx].score, p[\"top_k\"], p[\"chunk_size\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f435e13-1a55-4711-aeca-a1faae8fbdf0",
   "metadata": {},
   "source": [
    "### Run ParamTuner (Async)\n",
    "\n",
    "Run the async version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83a95b40-a6a2-42d0-859c-d9fd2a2b8226",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.experimental.param_tuner import AsyncParamTuner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dba9783-0d4f-40c8-b4a5-2fa01a3ffde8",
   "metadata": {},
   "outputs": [],
   "source": [
    "aparam_tuner = AsyncParamTuner(\n",
    "    aparam_fn=aobjective_function,\n",
    "    param_dict=param_dict,\n",
    "    fixed_param_dict=fixed_param_dict,\n",
    "    num_workers=2,\n",
    "    show_progress=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4789a7e0-4253-44d7-9548-71d1e1212e8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = await aparam_tuner.atune()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ed6a696-4c2a-4c42-a400-cf53f50e82a3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 0.9521222054806685\n",
      "Top-k: 2\n",
      "Chunk size: 512\n"
     ]
    }
   ],
   "source": [
    "best_result = results.best_run_result\n",
    "best_top_k = results.best_run_result.params[\"top_k\"]\n",
    "best_chunk_size = results.best_run_result.params[\"chunk_size\"]\n",
    "print(f\"Score: {best_result.score}\")\n",
    "print(f\"Top-k: {best_top_k}\")\n",
    "print(f\"Chunk size: {best_chunk_size}\")"
   ]
  },
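  {
   "cell_type": "markdown",
   "id": "sketch-bounded-concurrency",
   "metadata": {},
   "source": [
    "The async run above caps concurrency with `num_workers`. As a rough illustration of how such a cap can work, the sketch below limits the number of in-flight coroutine calls with an `asyncio.Semaphore`. This is a generic pattern under our own assumptions, not the `AsyncParamTuner` implementation; `run_bounded` and `fake_objective` are hypothetical names.\n",
    "\n",
    "```python\n",
    "import asyncio\n",
    "\n",
    "\n",
    "async def run_bounded(async_fn, inputs, num_workers):\n",
    "    \"\"\"Run async_fn over inputs with at most num_workers calls in flight.\"\"\"\n",
    "    semaphore = asyncio.Semaphore(num_workers)\n",
    "\n",
    "    async def worker(item):\n",
    "        async with semaphore:\n",
    "            return await async_fn(item)\n",
    "\n",
    "    # gather returns results in the same order as the inputs\n",
    "    return await asyncio.gather(*(worker(i) for i in inputs))\n",
    "\n",
    "\n",
    "async def fake_objective(params):\n",
    "    # hypothetical stand-in for an async objective function\n",
    "    await asyncio.sleep(0.01)\n",
    "    return params[\"chunk_size\"] * params[\"top_k\"]\n",
    "\n",
    "\n",
    "combos = [{\"chunk_size\": c, \"top_k\": k} for c in (256, 512) for k in (1, 2)]\n",
    "scores = asyncio.run(run_bounded(fake_objective, combos, num_workers=2))\n",
    "print(scores)  # [256, 512, 512, 1024]\n",
    "```\n",
    "\n",
    "Inside a notebook, where an event loop is already running, call `await run_bounded(...)` directly instead of `asyncio.run(...)`. A small worker count is a reasonable starting point when every call hits a rate-limited API."
   ]
  },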
  {
   "cell_type": "markdown",
   "id": "3d5cac7e-c405-484a-9edb-c90b1e16d01c",
   "metadata": {},
   "source": [
    "## Run ParamTuner (Ray Tune)\n",
    "\n",
    "Here we run our tuner powered by [Ray Tune](https://docs.ray.io/en/latest/tune/index.html), a library for scalable hyperparameter tuning.\n",
    "\n",
    "In the notebook we run it locally, but you can run this on a cluster as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03cd4a6e-a2b3-4ff5-a352-13bc3056333c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.experimental.param_tuner import RayTuneParamTuner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4846aa64-7519-4470-ac9c-fa819e1dc56f",
   "metadata": {},
   "outputs": [],
   "source": [
    "param_tuner = RayTuneParamTuner(\n",
    "    param_fn=objective_function,\n",
    "    param_dict=param_dict,\n",
    "    fixed_param_dict=fixed_param_dict,\n",
    "    run_config_dict={\"storage_path\": \"/tmp/custom/ray_tune\", \"name\": \"my_exp\"},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e34a84be-8aac-4a08-ae55-40feda76e089",
   "metadata": {},
   "outputs": [],
   "source": [
    "results = param_tuner.tune()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a04c01c7-837e-4a06-be36-34e2b0761738",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['docs', 'eval_qs', 'ref_response_strs', 'chunk_size', 'top_k'])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results.best_run_result.params.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d899ba9d-d7ab-4401-a4a6-152f0e01869f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results.best_idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c904b1f2-a66b-4540-821a-062c94ff439f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 0.9486126773392092\n",
      "Top-k: 2\n",
      "Chunk size: 512\n"
     ]
    }
   ],
   "source": [
    "best_result = results.best_run_result\n",
    "\n",
    "best_top_k = results.best_run_result.params[\"top_k\"]\n",
    "best_chunk_size = results.best_run_result.params[\"chunk_size\"]\n",
    "print(f\"Score: {best_result.score}\")\n",
    "print(f\"Top-k: {best_top_k}\")\n",
    "print(f\"Chunk size: {best_chunk_size}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
